Abstract

Hybrid-Lambda is a software package that simulates gene trees under Kingman or two 
Lambda-coalescent processes within species networks or species trees. It is written in C++ and
released under the GNU General Public License (GPL) version 3. Users can modify and
redistribute it under the terms of this license. For details of this license, visit http://www.gnu.org/licenses/.
Hybrid-Lambda is available at https://code.google.com/p/hybrid-lambda.

Species trees describe ancestral relations among species. Gene trees describe the random
ancestral relations of alleles sampled within species. Gene trees and species trees are often assumed to
be bifurcating (Degnan and Salter, 2005; Hudson, 1990; Kingman, 1982). However, for organisms 
exhibiting sweepstakes reproduction, such as oysters and other marine organisms (Árnason, 2004;
Beckenbach, 1994; Eldon, 2011; Eldon and Wakeley, 2006; Hedgecock, 1994; Hedgecock et al., 1982;
Sargsyan and Wakeley, 2008), the Kingman coalescent may not be appropriate, as it only allows 
binary mergers of ancestral lineages. Thus, we consider models that allow more than two lineages 
to coalesce simultaneously in the gene trees, that is, multiple merger coalescents, also known as
Λ-coalescents (Donnelly and Kurtz, 1999; Pitman, 1999; Sagitov, 1999). The concordance
probabilities between multiple merger gene genealogies and a species tree of two species are investigated by
Eldon and Degnan (2012). 

Species trees may also fail to be bifurcating due to either polytomies or hybridization events. 
Simulating gene trees from a rooted species network modeling hybridization is another application 
of hybrid-Lambda. The package ms (Hudson, 2002) can simulate gene trees within a general species 
network. However, the input of ms is difficult to automate when the network is sophisticated or
generated from other software. Other simulation studies using species networks have either used a small
number of network topologies coded individually (for example, in phylonet (Than et al., 2008)) or
have assumed that gene trees have evolved on species trees embedded within the species network 
(Holland et al., 2008; Kubatko, 2009; Meng and Kubatko, 2009). The software hybrid-Lambda will 
help automate simulation studies of hybridization allowing for a large number of species network 
topologies and allowing gene trees to evolve directly within the network. 

Figure 1: Example of a multiple merger gene genealogy with topology
(((a1,a2,a3),c1),(b1,c2,d1)) simulated in a species network with topology
((((B,C)s1)h1#H1,A)s2,(h1#H1,D)s3)r, where H1 is the probability that a lineage has its
ancestry from its left parental population.



1 DESCRIPTION 

The program input file for hybrid-Lambda is a character string that describes relationships be- 
tween species. Standard Newick format (Olsen, 1990) is used for inputting species trees and out- 
putting gene trees, whose interior nodes are not labelled. An extended Newick formatted string 
(Cardona et al., 2008; Huson et al., 2010) labels all internal nodes, and is used for inputting species 
networks (see Fig. 1). 
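
As an illustration of the extended Newick convention, the following minimal Python sketch finds the hybrid nodes in a network string. It assumes that a hybrid node is written as `label#parameter` and appears twice in the string, once under each parental population, as in the topology of Fig. 1. The function name `hybrid_nodes` and the regular expression are illustrative assumptions and are not part of hybrid-Lambda.

```python
import re
from collections import Counter

def hybrid_nodes(extended_newick: str) -> dict:
    """Return {hybrid label: text after '#'} for labels that occur exactly twice.

    Assumption: hybrid nodes are written as label#parameter (e.g. h1#H1),
    once for each of the two parental populations.
    """
    pairs = re.findall(r"([A-Za-z0-9_]+)#([A-Za-z0-9_.]+)", extended_newick)
    counts = Counter(label for label, _ in pairs)
    return {label: param for label, param in pairs if counts[label] == 2}

# Species network topology quoted in the caption of Fig. 1.
network = "((((B,C)s1)h1#H1,A)s2,(h1#H1,D)s3)r;"
print(hybrid_nodes(network))

# A plain species tree contains no '#' and therefore no hybrid nodes.
print(hybrid_nodes("((A,B)s1,C)r;"))
```

A string with no `#` markers is treated as an ordinary species tree, which matches the distinction drawn above between standard and extended Newick input.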



1.1 Parameters 

Hybrid-Lambda can sample multiple lineages from each species and then simulate either a
Kingman or a multiple merger (Λ) coalescent within a given species network. The coalescent is a
continuous-time Markov process, in which times between coalescent events are independent expo- 
nential random variables with different rates. The rates are determined by a so-called coalescent 
parameter in the program that can be input via command line, or a(n) (extended) Newick format- 
ted input string with specific coalescent parameters as branch lengths. By default, the Kingman 
coalescent is used, for which a population with b lineages sampled has two lineages coalesce at rate
$\lambda_{b,2} = \binom{b}{2}$. One can choose between two different examples of a Λ-coalescent. If the coalescent
parameter is between 0 and 1, then we use ψ for the coalescent parameter, and the rate at
which k out of b active ancestral lineages merge is

$$\lambda_{b,k} = \binom{b}{k}\, \psi^{k} (1-\psi)^{b-k}, \qquad \psi \in (0,1), \qquad (1)$$

and if the coalescent parameter is between 1 and 2, then we use α for the coalescent parameter,
and the rate of k-mergers is

$$\lambda_{b,k} = \binom{b}{k}\, \frac{B(k-\alpha,\, b-k+\alpha)}{B(2-\alpha,\, \alpha)}, \qquad \alpha \in (1,2), \qquad (2)$$

where B(·,·) is the beta function (Schweinsberg, 2003).
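
The three merger rates can be evaluated numerically. The Python sketch below is not taken from the hybrid-Lambda source; it assumes the ψ-model rate has the form binom(b,k) ψ^k (1-ψ)^(b-k) and the α-model rate has the form binom(b,k) B(k-α, b-k+α) / B(2-α, α). All function names are hypothetical. It also simulates the number of ancestral lineages in a single population, drawing exponential waiting times with the total merger rate and choosing the merger size in proportion to its rate.

```python
import random
from math import comb, exp, lgamma

def log_beta(x: float, y: float) -> float:
    """Logarithm of the beta function B(x, y) for x, y > 0."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def kingman_rate(b: int, k: int) -> float:
    """Kingman coalescent: only binary mergers, at rate binom(b, 2)."""
    return float(comb(b, 2)) if k == 2 else 0.0

def psi_rate(b: int, k: int, psi: float) -> float:
    """Assumed psi-model rate of k-mergers among b lineages, 0 < psi < 1."""
    return comb(b, k) * psi**k * (1.0 - psi) ** (b - k)

def alpha_rate(b: int, k: int, alpha: float) -> float:
    """Assumed alpha-model rate of k-mergers among b lineages, 1 < alpha < 2."""
    log_ratio = log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha)
    return comb(b, k) * exp(log_ratio)

def simulate_lineage_counts(n: int, rate, rng: random.Random):
    """Return [(time, lineages remaining)] until one lineage is left.

    `rate(b, k)` gives the rate at which k of b lineages merge.
    """
    t, b, events = 0.0, n, []
    while b > 1:
        sizes = list(range(2, b + 1))
        rates = [rate(b, k) for k in sizes]
        t += rng.expovariate(sum(rates))           # exponential waiting time
        k = rng.choices(sizes, weights=rates)[0]   # size of the merger
        b -= k - 1                                  # k lineages become one
        events.append((t, b))
    return events

rng = random.Random(1)
print(simulate_lineage_counts(6, kingman_rate, rng))
print(simulate_lineage_counts(6, lambda b, k: alpha_rate(b, k, 1.5), rng))
```

As a sanity check on these assumed forms, the α-model rate of binary mergers approaches the Kingman rate binom(b, 2) as α approaches 2.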

The program hybrid-Lambda assumes that the input network (tree) branch lengths are in
coalescent units. However, this is not essential. Branch lengths in coalescent units can instead be
obtained from an alternative input file that gives numbers of generations as branch lengths, each of
which is then divided by the corresponding effective population size. By default, effective population
sizes on all branches are assumed to be equal and unchanged. Users can change this parameter using
the command line, or using a(n) (extended) Newick formatted string to specify population sizes on all
branches through another input file.
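
The conversion described above amounts to one division per branch. The following Python sketch assumes branches are identified by the label of the node below them and that a single default population size applies wherever none is specified. The function name, the dictionary layout, and the default value of 10000 are illustrative assumptions, not the program's actual interface.

```python
def to_coalescent_units(generations, pop_sizes=None, default_size=10000.0):
    """Convert branch lengths in generations to coalescent units.

    generations: {branch label: length in generations}
    pop_sizes:   {branch label: effective population size}, optional;
                 branches not listed use `default_size` (an assumed value).
    """
    pop_sizes = pop_sizes or {}
    return {
        branch: length / pop_sizes.get(branch, default_size)
        for branch, length in generations.items()
    }

# Hypothetical branch lengths: branch s1 has a larger population than the rest.
branch_generations = {"A": 5000.0, "B": 5000.0, "s1": 20000.0}
print(to_coalescent_units(branch_generations, {"s1": 20000.0}))
```

A branch with a larger effective population size is shorter in coalescent units for the same number of generations, so lineages are less likely to coalesce on it.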

The simulation requires ultrametric species networks, i.e. equal lengths of all paths from tip to 
root. Hybrid-Lambda checks the distances in coalescent units between the root and all tip nodes 
and prints out warning messages if the ultrametric assumption is violated. 
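
A check of this kind can be sketched in Python as follows. The network is represented as a mapping from each node to its children and branch lengths, so a hybrid node is simply a node reachable from two parents. This representation, the function names, and the branch lengths in the example are assumptions made for illustration; hybrid-Lambda itself reads (extended) Newick strings and prints warnings rather than returning a value.

```python
def root_to_tip_distances(children, root):
    """Return {tip: [length of every root-to-tip path]}.

    children: {node: [(child, branch_length), ...]}; tips have no entry.
    """
    distances = {}
    stack = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        kids = children.get(node, [])
        if not kids:
            distances.setdefault(node, []).append(dist)
        for child, length in kids:
            stack.append((child, dist + length))
    return distances

def is_ultrametric(children, root, tol=1e-9):
    """True if all root-to-tip path lengths agree to within `tol`."""
    all_paths = [d for paths in root_to_tip_distances(children, root).values()
                 for d in paths]
    return max(all_paths) - min(all_paths) <= tol

# Topology ((((B,C)s1)h1#H1,A)s2,(h1#H1,D)s3)r with hypothetical branch lengths.
network = {
    "r": [("s2", 1.0), ("s3", 1.0)],
    "s2": [("h1", 0.5), ("A", 2.0)],
    "s3": [("h1", 0.5), ("D", 2.0)],
    "h1": [("s1", 0.5)],
    "s1": [("B", 1.0), ("C", 1.0)],
}
print(is_ultrametric(network, "r"))
```

Because the hybrid node h1 is reached through both s2 and s3, tips B and C each contribute two paths, and both must have the same length for the network to pass the check.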

1.2 Output 

Hybrid-Lambda outputs simulated gene trees in three different files: one contains gene trees with 
branch lengths in coalescent units, another uses the number of generations as branch lengths, and 
the third uses the number of expected mutations as branch lengths. 

Besides outputting gene tree files, hybrid-Lambda also provides several functions for analysis 
purposes: 

• user-defined random seed for simulation, 

• a frequency table of gene tree topologies, 

• a figure of the species network or tree (this function only works when LaTeX or dot is installed),

• when gene trees are simulated from two populations, hybrid-Lambda can generate a table of 
relative frequencies of reciprocal monophyly, paraphyly, and polyphyly. 

Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation,
the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF 